Closed Bug 1922347 Opened 8 months ago Closed 7 months ago

crash at [@ nsSplittableFrame::GetFirstInFlowIfCached]

Categories

(Core :: Layout, defect)

defect

Tracking

()

VERIFIED FIXED
133 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- unaffected
firefox131 --- wontfix
firefox132 --- wontfix
firefox133 --- fixed

People

(Reporter: tsmith, Assigned: TYLin)

References

(Blocks 1 open bug, Regression)

Details

(4 keywords, Whiteboard: [bugmon:bisected,confirmed])

Attachments

(1 file)

Attached file testcase.html

Found while fuzzing m-c 20240804-f19c35b800f5 (--enable-debug --enable-fuzzing)

To reproduce via Grizzly Replay:

$ pip install fuzzfetch grizzly-framework --upgrade
$ python -m fuzzfetch -d --fuzzing -n firefox
$ python -m grizzly.replay.bugzilla ./firefox/firefox <bugid>

This seems to reproduce with a debug build but not an ASan build.

==35931==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address (pc 0x7c6650b398c2 bp 0x7ffcde8ab880 sp 0x7ffcde8ab870 T35931)
==35931==The signal is caused by a READ memory access.
==35931==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
    #0 0x7c6650b398c2 in nsSplittableFrame::GetFirstInFlowIfCached() const /builds/worker/checkouts/gecko/layout/generic/nsSplittableFrame.cpp:184:3
    #1 0x7c6650b1a00d in nsSplittableFrame::FirstInFlow() const /builds/worker/checkouts/gecko/layout/generic/nsSplittableFrame.cpp:190:31
    #2 0x7c6650b39144 in nsSplittableFrame::UpdateFirstContinuationAndFirstInFlowCache() /builds/worker/checkouts/gecko/layout/generic/nsSplittableFrame.cpp:270:44
    #3 0x7c6650b38d13 in nsSplittableFrame::RemoveFromFlow(nsIFrame*) /builds/worker/checkouts/gecko/layout/generic/nsSplittableFrame.cpp
    #4 0x7c6650b38c62 in nsSplittableFrame::Destroy(mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsSplittableFrame.cpp:42:5
    #5 0x7c6650a5e1db in nsContainerFrame::Destroy(mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsContainerFrame.cpp:301:22
    #6 0x7c6650b1c11e in nsLineBox::DeleteLineList(nsPresContext*, nsLineList&, nsFrameList*, mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsLineBox.cpp:369:14
    #7 0x7c6650a23159 in nsBlockFrame::Destroy(mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsBlockFrame.cpp:463:3
    #8 0x7c6650a5df63 in DestroyFrames /builds/worker/checkouts/gecko/layout/generic/nsFrameList.cpp:36:12
    #9 0x7c6650a5df63 in nsContainerFrame::Destroy(mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsContainerFrame.cpp:234:11
    #10 0x7c6650b1c11e in nsLineBox::DeleteLineList(nsPresContext*, nsLineList&, nsFrameList*, mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsLineBox.cpp:369:14
    #11 0x7c6650a23159 in nsBlockFrame::Destroy(mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsBlockFrame.cpp:463:3
    #12 0x7c6650b1c11e in nsLineBox::DeleteLineList(nsPresContext*, nsLineList&, nsFrameList*, mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsLineBox.cpp:369:14
    #13 0x7c6650a23159 in nsBlockFrame::Destroy(mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsBlockFrame.cpp:463:3
    #14 0x7c6650b1c11e in nsLineBox::DeleteLineList(nsPresContext*, nsLineList&, nsFrameList*, mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsLineBox.cpp:369:14
    #15 0x7c6650a23159 in nsBlockFrame::Destroy(mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsBlockFrame.cpp:463:3
    #16 0x7c6650a5df63 in DestroyFrames /builds/worker/checkouts/gecko/layout/generic/nsFrameList.cpp:36:12
    #17 0x7c6650a5df63 in nsContainerFrame::Destroy(mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsContainerFrame.cpp:234:11
    #18 0x7c6650b1c11e in nsLineBox::DeleteLineList(nsPresContext*, nsLineList&, nsFrameList*, mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsLineBox.cpp:369:14
    #19 0x7c6650a23159 in nsBlockFrame::Destroy(mozilla::FrameDestroyContext&) /builds/worker/checkouts/gecko/layout/generic/nsBlockFrame.cpp:463:3
    #20 0x7c6650a40cc3 in nsBlockFrame::DoRemoveFrame(mozilla::FrameDestroyContext&, nsIFrame*, unsigned int) /builds/worker/checkouts/gecko/layout/generic/nsBlockFrame.cpp:7069:20
    #21 0x7c6650a40460 in nsBlockFrame::RemoveFrame(mozilla::FrameDestroyContext&, mozilla::FrameChildListID, nsIFrame*) /builds/worker/checkouts/gecko/layout/generic/nsBlockFrame.cpp:6234:5
    #22 0x7c6650964036 in nsCSSFrameConstructor::ContentRemoved(nsIContent*, nsIContent*, nsCSSFrameConstructor::RemoveFlags) /builds/worker/checkouts/gecko/layout/base/nsCSSFrameConstructor.cpp:7550:5
    #23 0x7c665095f4bb in nsCSSFrameConstructor::RecreateFramesForContent(nsIContent*, nsCSSFrameConstructor::InsertionKind) /builds/worker/checkouts/gecko/layout/base/nsCSSFrameConstructor.cpp:8472:7
    #24 0x7c6650964b0d in nsCSSFrameConstructor::MaybeRecreateContainerForFrameRemoval(nsIFrame*) /builds/worker/checkouts/gecko/layout/base/nsCSSFrameConstructor.cpp
    #25 0x7c665095f424 in nsCSSFrameConstructor::RecreateFramesForContent(nsIContent*, nsCSSFrameConstructor::InsertionKind) /builds/worker/checkouts/gecko/layout/base/nsCSSFrameConstructor.cpp:8461:16
    #26 0x7c665091ff90 in mozilla::RestyleManager::ProcessRestyledFrames(nsStyleChangeList&) /builds/worker/checkouts/gecko/layout/base/RestyleManager.cpp:1681:25
    #27 0x7c6650926df4 in mozilla::RestyleManager::DoProcessPendingRestyles(mozilla::ServoTraversalFlags) /builds/worker/checkouts/gecko/layout/base/RestyleManager.cpp:3285:7
    #28 0x7c66508fa025 in mozilla::RestyleManager::ProcessPendingRestyles() /builds/worker/checkouts/gecko/layout/base/RestyleManager.cpp:3371:3
    #29 0x7c66508f9376 in mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush) /builds/worker/checkouts/gecko/layout/base/PresShell.cpp:4384:37
    #30 0x7c66508be86e in FlushPendingNotifications /builds/worker/workspace/obj-build/dist/include/mozilla/PresShell.h:1455:5
    #31 0x7c66508be86e in nsRefreshDriver::FlushLayoutOnPendingDocsAndFixUpFocus() /builds/worker/checkouts/gecko/layout/base/nsRefreshDriver.cpp:2199:31
    #32 0x7c66508bd769 in nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsRefreshDriver::IsExtraTick) /builds/worker/checkouts/gecko/layout/base/nsRefreshDriver.cpp:2782:3
    #33 0x7c66508c6a71 in TickDriver /builds/worker/checkouts/gecko/layout/base/nsRefreshDriver.cpp:368:13
    #34 0x7c66508c6a71 in mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver>>&) /builds/worker/checkouts/gecko/layout/base/nsRefreshDriver.cpp:346:7
    #35 0x7c66508c6970 in mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) /builds/worker/checkouts/gecko/layout/base/nsRefreshDriver.cpp:362:5
    #36 0x7c66508c680d in mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) /builds/worker/checkouts/gecko/layout/base/nsRefreshDriver.cpp:952:5
    #37 0x7c66508c5afc in mozilla::VsyncRefreshDriverTimer::TickRefreshDriver(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) /builds/worker/checkouts/gecko/layout/base/nsRefreshDriver.cpp:862:5
    #38 0x7c66508c4e89 in mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyVsyncTimerOnMainThread() /builds/worker/checkouts/gecko/layout/base/nsRefreshDriver.cpp:593:14
    #39 0x7c664fd2ec4b in mozilla::dom::VsyncMainChild::RecvNotify(mozilla::VsyncEvent const&, float const&) /builds/worker/checkouts/gecko/dom/ipc/VsyncMainChild.cpp:66:15
    #40 0x7c664ffb53c7 in mozilla::dom::PVsyncChild::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PVsyncChild.cpp:235:78
    #41 0x7c664feeb0b0 in mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PContentChild.cpp:8260:32
    #42 0x7c664ba2ab5f in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1785:25
    #43 0x7c664ba27ae2 in mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message>>) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1712:9
    #44 0x7c664ba28762 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1503:3
    #45 0x7c664ba298af in mozilla::ipc::MessageChannel::MessageTask::Run() /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1603:14
    #46 0x7c664aea3cd7 in mozilla::RunnableTask::Run() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:618:16
    #47 0x7c664ae99766 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:945:26
    #48 0x7c664ae98177 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:768:15
    #49 0x7c664ae985f5 in mozilla::TaskController::ProcessPendingMTTask(bool) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:554:36
    #50 0x7c664aea76a9 in operator() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:271:37
    #51 0x7c664aea76a9 in mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_1>::Run() /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.h:548:5
    #52 0x7c664aebad6b in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1155:16
    #53 0x7c664aec1a4f in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:480:10
    #54 0x7c664ba30693 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:107:5
    #55 0x7c664b983ac1 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
    #56 0x7c664b983ac1 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
    #57 0x7c6650536248 in nsBaseAppShell::Run() /builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp:148:27
    #58 0x7c66505e3848 in nsAppShell::Run() /builds/worker/checkouts/gecko/widget/gtk/nsAppShell.cpp:469:33
    #59 0x7c66514ac80b in XRE_RunAppShell() /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:710:20
    #60 0x7c664ba31536 in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:235:9
    #61 0x7c664b983ac1 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
    #62 0x7c664b983ac1 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
    #63 0x7c66514ac09b in XRE_InitChildProcess(int, char**, XREChildData const*) /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:645:34
    #64 0x5931dc95f08e in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:403:22
Flags: in-testsuite?

The functions UpdateFirstContinuationAndFirstInFlowCache() and GetFirstInFlowIfCached() in the backtrace are a good clue that this is probably an issue with the first-in-flow cache that TYLin recently added (see bug 1903652 and bug 1901515 and related bugs).

TYLin, could you take a look here?

Flags: needinfo?(aethanyc)
Keywords: pernosco-wanted
Severity: -- → S3

The Pernosco session is pending, the Pernosco devs are investigating an issue. I will add the link when it is available. For now I am removing the pernosco-wanted keyword to prevent bugmon from trying to get another session.

Flags: needinfo?(twsmith)
Keywords: pernosco-wanted

Verified bug as reproducible on mozilla-central 20241003213146-ba03fbbe15cf.
The bug appears to have been introduced in the following build range:

Start: dbae452de6c45094f6aa4e50c593d05dd8900baf (20240623020548)
End: 46b26e2046605eeda18e19a69c5f67259a3c8563 (20240623043833)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=dbae452de6c45094f6aa4e50c593d05dd8900baf&tochange=46b26e2046605eeda18e19a69c5f67259a3c8563

Keywords: regression
Whiteboard: [bugmon:bisected,confirmed]
Flags: needinfo?(twsmith)
Regressed by: 1903652
Flags: needinfo?(twsmith)

Set release status flags based on info from the regressing bug 1903652

A Pernosco session is available here: https://pernos.co/debug/uvjE0FdWAZc0e3UIdWFk5Q/index.html

Flags: needinfo?(twsmith)

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:dholbert, could you consider increasing the severity of this security bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(dholbert)
Severity: S3 → S2
Flags: needinfo?(dholbert)

I'll take a look.

Assignee: nobody → aethanyc
Flags: needinfo?(aethanyc)

I use this command ./mach mozregression -B debug -b 2024-10-02 -g 2024-10-24 --find-fix -a ~/Downloads/1922347.html, where 1922347.html is a local copy of the testcase in comment 0. The testcase no longer crashes after Jonathan's patch in bug 1926512, which caches the first-continuation/first-in-flow directly in nsSplittableFrame. I think the patch might simplify a corner case to remove/update cache in frame property.

Given this bug is sec-high and is currently in release channel, we shouldn't add the testcase as a crashtest until Jonathan's patch reaches release channel for at least four weeks per https://firefox-source-docs.mozilla.org/bug-mgmt/processes/fixing-security-bugs.html#landing-tests

Depends on: 1926512

Looking at the pernosco trace, I found a few things:

  • We were crashing while evaluating the assertion condition in GetFirstInFlowIfCached, here:
nsIFrame* nsSplittableFrame::GetFirstInFlowIfCached() const {
[...]
  nsIFrame* firstInFlow = GetProperty(FirstInFlowProperty());
  MOZ_ASSERT(!firstInFlow || !firstInFlow->GetPrevInFlow(),
             "First-in-flow shouldn't have a prev-in-flow!");

At the point where we crash, this is a valid frame, and so this has a valid property-table, but the firstInFlow that we found from the property table is a pointer to a frame that's been deleted and frame-poisoned. (So our cached first-in-flow has been deleted out from under us, basically.)

This ultimately doesn't matter because the only reason we're asking for the first-in-flow is while calling UpdateFirstContinuationAndFirstInFlowCache -- we're just updating our cache, not calling any methods on the first-in-flow or anything.

But the assertion condition !firstInFlow || !firstInFlow->GetPrevInFlow() does call a virtual function GetPrevInFlow on the first-in-flow, so if the first-in-flow is filled with poison, then we crash (safely) when dereferencing the vtable pointer. It's perhaps a bit troublesome that we're getting an invalid frame as our cached-first-in-flow -- there might be other cases where that could lead to trouble -- but probably in most (all?) such cases, this would just be during frame destruction would at-worst lead to a frame-poisoning crash.

jfkthame's patch removes this whole function (including the assertion whose condition is making us crash), so it makes sense that we're fine after that patch.

So, a few high-level takeaways:

  1. This was only a problem for debug builds (as observed in comment 0) because we had this extra check in an assertion that we ran when looking up the cached first-in-flow pointer, in debug builds (including a virtual function call which crashes if the cached pointer is pointing at a destroyed/poisoned frame).
  2. We can indeed consider this fixed by bug 1926512, because it removed this assertion (and its debug-only check).
  3. I don't think we need to consider this sec-high. The crashy situation here seems to be mitigated-by-frame-poisoning.

Having said that, I'm curious if this is how frame poisoning crashes are expected to show up in UBSan builds, and if it's a problem that frame-poisoning crashes are flagged as undefined-behavior here...

Comment 0 reported this as:

==35931==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address (pc 0x7c6650b398c2 bp 0x7ffcde8ab880 sp 0x7ffcde8ab870 T35931)
==35931==The signal is caused by a READ memory access.
==35931==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.

It ends with "see register values below" but it doesn't look like we captured those registers (at least not "below" - we did capture pc/bp/sp above, but the logging seems to be referring to values that are meant to come later).

Anyway - in pernosco at least, I can confirm that this is frame-poisoning -- we're making a virtual function call firstInFlow->GetPrevInFlow(), with p *firstInFlow showing that its memory has all been filled with repetitions of 0x7ffffffff0dea7ff (and info vtbl firstInFlow confirms that we've got that as the vtable pointer too). That's the frame-poisoning-address mentioned in earlier bugs e.g. bug 679933 comment 3.

Tyson, do you know if there's a way we can tweak our UBSan configuration to detect that this is in fact frame-poisoning and hence safe? Or, is the fact that we're being flagged as undefined-behavior a signal that this is technically not-safe and frame-poisoning is somehow insufficiently-hardened here?

Flags: needinfo?(twsmith)

Daniel, thank you for the great analysis. Given this bug is mitigated by frame-poisoning, I'll reduced the severity to S3.

Severity: S2 → S3

(In reply to Daniel Holbert [:dholbert] from comment #11)

Tyson, do you know if there's a way we can tweak our UBSan configuration to detect that this is in fact frame-poisoning and hence safe? Or, is the fact that we're being flagged as undefined-behavior a signal that this is technically not-safe and frame-poisoning is somehow insufficiently-hardened here?

This isn't actually a UBSan build it's a fuzzing debug build, I know it's deceiving since the log output says ==35931==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address..., only the UBSan logger is used to report a SEGV (likely due to frame-poisoning from your analysis) . On our other fuzzing build which is ASan+UBSan (labelled ASan) it would be reported as a use after poison but since this is only affects debug builds the issue is not present on the ASan+UBSan builds.

Flags: needinfo?(twsmith)
Group: layout-core-security
Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 133 Branch

Verified bug as fixed on rev mozilla-central 20241025155727-a5db02c4fbb5.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Status: RESOLVED → VERIFIED
Keywords: bugmon

(In reply to Daniel Holbert [:dholbert] from comment #9)

It's perhaps a bit troublesome that we're getting an invalid frame as our cached-first-in-flow -- there might be other cases where that could lead to trouble -- but probably in most (all?) such cases, this would just be during frame destruction would at-worst lead to a frame-poisoning crash.

This is indeed troublesome, and we can crash in a sort-of-but-not-entirely frame-poisoning way, as observed in bug 1913567 comment 14.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: